Bio-Inspired Metaheuristic Optimization Algorithms for Biomarker Identification in Mass Spectrometry Analysis

Authors

Syarifah Adilah Mohamed Yusoff

Ibrahim Venkat

Umi Kalsom Yusof

Rosni Abdullah

Abstract

Mass spectrometry is an emerging technique that is continuously gaining momentum among bioinformatics researchers who intend to study the biological or chemical properties of complex structures such as protein sequences. This advancement also extends to the discovery of proteomic biomarkers in accessible body fluids such as serum, saliva, and urine. Recently, the literature reveals that sophisticated computational techniques that mimic survival and other natural processes adapted from biological life yield promising results when reasoning over voluminous mass spectrometry data. Such advanced approaches can provide efficient ways to mine mass spectrometry data in order to extract parsimonious features that represent vital information, specifically for discovering disease-related protein patterns in complex protein sequences. This article provides a systematic survey of bio-inspired approaches to feature subset selection on mass spectrometry data for biomarker analysis.

DOI: 10.4018/jncr.2012040104. International Journal of Natural Computing Research, 3(2), 64-85, April-June 2012. Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

[…] such as serum, urine, nipple aspirate fluids and so on. This valuable information paves the way for exploration in proteomics studies, viz., characterization of regulatory and functional networks, investigation of precise molecular defects in biological fluids, and identification of the symptoms of various stages of a disease via the development of reagents (Celis & Gromov, 2003). Beyond such explorations, it also provides functional insight into the development of clinically significant drugs. The output of a typical mass spectrometry (MS) analysis is a spectrum, which can be represented as an xy-graph of mass-to-charge ratio (m/z) versus ionization intensity.
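The sketch below makes the spectrum representation concrete. It treats a spectrum as a list of (m/z, intensity) pairs sorted by m/z and picks peaks as strict local intensity maxima that reach a minimum intensity. This is a minimal sketch under stated assumptions, not the authors' preprocessing. The function name `pick_peaks`, the `min_intensity` parameter and the toy values are hypothetical. Practical MS pipelines usually smooth and baseline-correct the signal before peak picking.

```python
from typing import List, Tuple

# A spectrum is a list of (m/z, intensity) points sorted by m/z.
Spectrum = List[Tuple[float, float]]


def pick_peaks(spectrum: Spectrum, min_intensity: float) -> Spectrum:
    """Return points that are strict local intensity maxima and reach min_intensity.

    Hypothetical, deliberately naive peak picker: no smoothing,
    baseline correction or normalisation is applied.
    """
    peaks: Spectrum = []
    for i in range(1, len(spectrum) - 1):
        mz, intensity = spectrum[i]
        prev_intensity = spectrum[i - 1][1]
        next_intensity = spectrum[i + 1][1]
        if (intensity >= min_intensity
                and intensity > prev_intensity
                and intensity > next_intensity):
            peaks.append((mz, intensity))
    return peaks


if __name__ == "__main__":
    # Made-up toy spectrum, not real MS data.
    toy = [(1000.0, 5.0), (1000.5, 40.0), (1001.0, 12.0), (1001.5, 8.0),
           (1002.0, 90.0), (1002.5, 30.0), (1003.0, 31.0), (1003.5, 2.0)]
    print(pick_peaks(toy, min_intensity=20.0))
```

Each retained (m/z, intensity) pair is a candidate feature for the selection methods discussed next.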
The significant information in a spectrum consists of intensity peaks at their corresponding m/z values. Because peak intensities represent the expression levels of certain peptide molecules, they can lead to the discovery of new biomarkers for a particular disease at different stages. However, MS data are high-dimensional, and a significant number of m/z values are correlated or noisy. This calls for robust pattern recognition techniques that can cope with large amounts of redundant data. Feature selection, the process of selecting a subset of the original features according to certain criteria, is an important and frequently used dimensionality reduction technique for data mining (Guyon & Elisseeff, 2003; Liu & Motoda, 1998). It reduces the number of features and removes irrelevant, redundant, or noisy data, which brings immediate benefits to applications: it speeds up data mining algorithms and improves mining performance, such as predictive accuracy and comprehensibility of results. In a biological context, the technique is also called discriminative gene selection, which detects influential genes from DNA microarray experiments. In MS analysis, feature selection plays two vital roles: (1) it drives a search for significant features that discriminate disease samples from control samples; and (2) it helps construct an appropriate classification model that enables the identification of potential biomarkers for further analysis. In general, feature selection algorithms can be classified into two categories, viz., feature ranking and subset selection. Feature ranking scores all features in the dataset with a metric, rank-lists them, and discards those that fall below a predefined threshold. The threshold is usually set as a cut-off score derived from the ranks.
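The feature-ranking scheme described above can be sketched as follows. The text does not name a metric, so this sketch assumes one: each m/z feature is scored by the absolute Welch t-statistic between disease and control samples. Features are rank-listed by score, and those below a threshold are discarded. As a simplifying assumption, the threshold here is a fixed number rather than one derived from the ranks. `welch_t`, `rank_features`, the threshold value and the toy data are all hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute Welch t-statistic between two groups (needs >= 2 values per group)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    diff = abs(statistics.mean(a) - statistics.mean(b))
    if se == 0.0:
        # Zero spread in both groups: perfectly separable unless the means coincide.
        return 0.0 if diff == 0.0 else math.inf
    return diff / se


def rank_features(cases: Sequence[Sequence[float]],
                  controls: Sequence[Sequence[float]],
                  threshold: float) -> List[Tuple[int, float]]:
    """Score each feature column, rank-list by score, drop scores below threshold."""
    scores = []
    for j in range(len(cases[0])):
        a = [row[j] for row in cases]
        b = [row[j] for row in controls]
        scores.append((j, welch_t(a, b)))
    scores.sort(key=lambda item: item[1], reverse=True)  # rank-list, best first
    return [(j, s) for j, s in scores if s >= threshold]


if __name__ == "__main__":
    # Made-up intensities: rows are samples, columns are m/z features.
    cases = [[10.0, 5.0, 3.0], [12.0, 5.0, 4.0], [11.0, 6.0, 3.0]]
    controls = [[2.0, 5.0, 3.0], [3.0, 6.0, 4.0], [1.0, 5.0, 3.0]]
    print(rank_features(cases, controls, threshold=2.0))
```

Each feature is scored on its own, so the sketch shows why such filters are fast. It also shows that they ignore interactions between features.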
In contrast, subset selection searches the space of possible feature subsets for the optimal one; that is, it evaluates a subset of features as a group for suitability. Subset selection algorithms can be further classified into three categories, viz., wrappers, filters, and embedded methods (Guyon & Elisseeff, 2003). Wrappers and filters are the most popular feature subset selection methods for achieving dimensionality reduction. Wrappers use a search algorithm to explore the space of possible feature subsets and evaluate each subset by running a learning model on it. Wrappers can be computationally expensive and risk overfitting the model. These drawbacks can be reduced by injecting heuristic techniques into the search process to approach an optimal subset, and by applying cross-validation to avoid overfitting. Filters are similar to wrappers in their search approach, but instead of evaluating subsets against a model, they apply a simpler filter-based criterion. Filter-based feature ranking techniques rank features independently, without involving any learning algorithm: each feature is scored by a particular method, and features are then selected based on their scores. Filter methods are the most commonly applied techniques in bioinformatics studies because they have proven to be computationally simple, fast, and independent of other analysis algorithms. They also allow features to be quantified and prioritized according to their scores, which is particularly important for biological interpretation.
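A wrapper driven by a bio-inspired heuristic search can be sketched with a simple genetic algorithm (GA), a classic bio-inspired metaheuristic. This is a minimal sketch under stated assumptions, not the method of any surveyed paper. Candidate subsets are bit masks. Each mask is scored by 3-fold cross-validated accuracy of a nearest-centroid classifier, which stands in for the learning model, minus a small per-feature penalty that favours parsimonious subsets. The GA uses tournament selection, one-point crossover, bit-flip mutation and elitism. `cv_accuracy`, `ga_wrapper`, every parameter value and the synthetic data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]


def cv_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                mask: Mask, folds: int = 3) -> float:
    """k-fold CV accuracy of a nearest-centroid classifier on the features in `mask`."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for f in range(folds):
        # Deterministic split: sample i belongs to fold i % folds.
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        centroids = {}
        for label in sorted(set(y[i] for i in train)):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
        for i in test:
            point = [X[i][j] for j in idx]
            pred = min(centroids, key=lambda lab: sum(
                (p - c) ** 2 for p, c in zip(point, centroids[lab])))
            correct += int(pred == y[i])
    return correct / len(X)


def ga_wrapper(X, y, pop_size: int = 20, generations: int = 30,
               p_mut: float = 0.1, penalty: float = 0.01, seed: int = 0) -> Mask:
    """Evolve feature masks; fitness = CV accuracy - penalty * subset size."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache: Dict[Mask, float] = {}  # avoid re-running the costly wrapper evaluation

    def fitness(mask: Mask) -> float:
        if mask not in cache:
            cache[mask] = cv_accuracy(X, y, mask) - penalty * sum(mask)
        return cache[mask]

    def tournament(pop: List[Mask]) -> Mask:
        return max(rng.sample(pop, 3), key=fitness)

    pop = [tuple(rng.randint(0, 1) for _ in range(n_feat)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = tuple(1 - bit if rng.random() < p_mut else bit for bit in child)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Synthetic toy data: feature 0 separates the classes, features 1-3 do not.
    y = [1 if i % 2 == 0 else 0 for i in range(12)]
    X = [[100.0 * y[i], float((i // 2) % 3), float(i // 2), 5.0] for i in range(12)]
    print(ga_wrapper(X, y))
```

Caching fitness values matters because each evaluation retrains the classifier on every fold. That is the computational cost the text attributes to wrappers, and the cross-validated score is the guard against overfitting it mentions.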
Their main drawback is that they are not optimized for use with a particular classifier, as they are completely independent of the classification […]

Similar resources

Investigation on Bio-Inspired Population Based Metaheuristic Algorithms for Optimization Problems in Ad Hoc Networks

Nature is a great source of inspiration for solving complex problems in networks. It helps to find the optimal solution. A metaheuristic algorithm is one type of nature-inspired algorithm that helps solve routing problems in networks. Dynamic features, frequent topology changes, and limited bandwidth make routing in MANETs challenging. Implementation of appropriate routing algori...

A Brief Review of Nature-Inspired Algorithms for Optimization

Swarm-intelligence-based and bio-inspired algorithms form a hot topic in the developments of new algorithms inspired by nature. These nature-inspired metaheuristic algorithms can be based on swarm intelligence, biological systems, physical and chemical systems. Therefore, these algorithms can be called swarm-intelligence-based, bio-inspired, physics- and chemistry-based, depending on the sources ...

Firefly Algorithm for Economic Power Dispatching With Pollutants Emission

Bio-inspired algorithms have become some of the most powerful algorithms for optimization. In this paper, we intend to use one of the recent bio-inspired metaheuristics, the Firefly Algorithm (FF), to optimize power dispatching. For evaluation, we adapt particle swarm optimization to the problem in the same way as the firefly algorithm. The application is done on an IEEE-14 and on two th...

IIR System Identification Using Improved Harmony Search Algorithm with Chaos

Because the error surface of adaptive infinite impulse response (IIR) systems is generally nonlinear and multimodal, conventional derivative-based techniques fail when used for adaptive identification of such systems. In this case, global optimization techniques are required in order to avoid local minima. Harmony search (HS), a music-inspired metaheuristic, is a recently ...

Journal title:

IJNCR

Volume 3, Issue

Pages -

Publication date: 2012
